Presentation: "Solving classical data analytic task by using modern distributed databases"
NoSQL databases have limited query languages that are not well suited to analytical requests. The classical solution offered by most of them is Hadoop integration, which is not fast. As a result, a number of fast, distributed, parallel query and computation engines have appeared recently to address Hadoop's performance problems.
The presentation will show how to solve classical data analytics tasks with modern distributed databases and in-memory engines, using Spark and Cassandra as the example. It will cover the following topics:
- Apache Spark benefits, architecture, and Scala API. (Don't be afraid of Scala; we are here to help you.)
- Loading data from and storing data to the Cassandra NoSQL database
- Data enrichment and joins
- Spark machine learning and graph algorithms
Target audience: Software engineers and solution architects who use or plan to use NoSQL products for analytics, particularly Cassandra and Spark.